Skip to content

Fix lazy If branch pruning and skipped-parent handling in graph runtime#9079

Open
JPPhoto wants to merge 3 commits intoinvoke-ai:mainfrom
JPPhoto:fix-if-branch-edge-pruning
Open

Fix lazy If branch pruning and skipped-parent handling in graph runtime#9079
JPPhoto wants to merge 3 commits intoinvoke-ai:mainfrom
JPPhoto:fix-if-branch-edge-pruning

Conversation

@JPPhoto
Copy link
Copy Markdown
Collaborator

@JPPhoto JPPhoto commented Apr 20, 2026

Summary

Fixes lazy If execution in invokeai/app/services/shared/graph.py for workflows where a selected branch shares ancestors with the unselected branch.

The runtime bug was that resolving an If node decremented indegree for the pruned input but left the pruned execution edge in place, so later parent completion could decrement the same dependency again. This could make the If node become ready too early, underflow indegree, or propagate None into required downstream inputs.

This PR:

  • prunes unselected If input edges from the execution graph during branch resolution
  • ignores skipped prepared exec nodes when matching live parents for downstream materialization
  • tolerates missing result payloads from skipped branch-local exec nodes during If input hydration
  • adds focused backend regression tests for positive and negative branch-selection cases
  • updates invokeai/app/services/shared/README.md to document the runtime behavior

Related Issues / Discussions

Complex workflow failure involving a single If node feeding positive_text_conditioning on flux_denoise.

QA Instructions

Load a workflow where collect -> if -> required downstream input is used and verify:

  • the selected branch output reaches the downstream node
  • branch-exclusive unselected ancestors are skipped
  • no KeyError, indegree underflow, or missing required-connection error occurs from the If runtime path

Merge Plan

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions github-actions Bot added python PRs that change python files services PRs that change app services python-tests PRs that change python tests labels Apr 20, 2026
@JPPhoto JPPhoto moved this to 6.13.x Theme: MODELS in Invoke - Community Roadmap Apr 20, 2026
@JPPhoto JPPhoto force-pushed the fix-if-branch-edge-pruning branch 3 times, most recently from 63e5b16 to f681759 Compare April 22, 2026 00:56
@JPPhoto JPPhoto requested a review from Pfannkuchensack April 22, 2026 10:48
@JPPhoto JPPhoto force-pushed the fix-if-branch-edge-pruning branch from f681759 to 3785efb Compare April 23, 2026 15:57
@Pfannkuchensack Pfannkuchensack self-assigned this Apr 25, 2026
@Pfannkuchensack
Copy link
Copy Markdown
Collaborator

Findings

Low: invokeai/app/services/shared/graph.py:781-795 _prepare_if_inputs silently swallows the case where a SELECTED branch's source has no entry in results and uses the field's default (typically None). This is a defensive fallback documented in invokeai/app/services/shared/README.md:182-184, but the synthetic test in tests/test_graph_execution_state.py:856-883 (test_prepare_if_inputs_ignores_selected_branch_sources_without_results) only constructs the state by hand (g.executed.add(true_value_exec_id) without writing to g.results); no test exercises a real scheduler trajectory that produces a selected-branch source skipped-but-executed state. The risk is that a future scheduler bug that incorrectly marks a selected-branch source as skipped now propagates None to downstream consumers as silent data loss instead of surfacing as KeyError. To expose this issue, add a test that constructs an actual graph + scheduling sequence (for example interleaved/nested IfInvocation nodes whose branch-exclusivity classifications cause a selected-branch source to be skipped through mark_exec_node_skipped) and asserts whether IfInvocation.value should reasonably be None there. If no such trajectory can be built, document explicitly that this branch is unreachable and replace the silent fallback with a precise RuntimeError so a future regression is visible.

Low: invokeai/app/services/shared/graph.py:366-371 _get_collect_iteration_mappings reads source_prepared_mapping[source_node_id] directly and does NOT apply the new "skipped" filter that _get_prepared_nodes_for_source now applies at line 426-431. Any CollectInvocation whose parent has skipped prepared exec nodes would still wire those skipped exec nodes as inputs, and _build_collect_collection -> _get_copied_result_value (line 747-748) crashes on missing results. In the current branch-exclusivity classification (_node_outputs_stay_in_branch, lines 143-151), a Collect with a skipped parent should itself be classified branch-exclusive, so this is theoretically not reachable; but the asymmetry between Collect parent resolution and other node parent resolution is the kind of inconsistency that breaks the next time the exclusivity logic is refined. To expose this issue, add a test that constructs a graph where a CollectInvocation's item parent is on the unselected branch of an IfInvocation (forcing the parent to be skipped) while the Collect itself is reachable, and asserts the engine does not crash; or harden _get_collect_iteration_mappings to apply the same filter.

Low: invokeai/app/services/shared/graph.py:480-499 get_iteration_node short-circuits with if len(prepared_nodes) == 1: return next(iter(prepared_nodes)) after the new skipped-exec filter. With multiple iterations of an outer iterator, it is possible for exactly one iteration's prepared exec to be live (others skipped) and the function returns that single live exec node WITHOUT verifying it actually matches the caller's prepared_iterator_nodes / parent iteration combination. The two new tests test_get_iteration_node_returns_single_active_prepared_exec_node and test_get_iteration_node_ignores_skipped_prepared_exec_nodes both pass [] for prepared_iterator_nodes, so neither covers the case where iteration-path matching matters. To expose this issue, add a test with an iterator whose body contains an If, run the iterator with iteration_count > 1, resolve different iterations to different branches so only one iteration's branch source remains live, and assert that downstream consumers for the OTHER iteration are not incorrectly bound to the surviving live exec node.

Low: invokeai/app/services/shared/graph.py:255-266 mark_exec_node_skipped calls self._state.executed.add(exec_node_id) and self._state._set_prepared_exec_state(exec_node_id, "skipped") unconditionally. If a node was previously executed normally (entry in results, state == "executed") and a separately-resolved IF later classifies the same prepared exec as part of an unselected branch, the metadata flips to state == "skipped" while results[exec_id] still holds the real output. After the new filter in _get_prepared_nodes_for_source (line 426-431), this exec node would suddenly become invisible to downstream materialization even though its output was real. This requires multiple IfInvocation nodes with overlapping branch-exclusive classifications and is unlikely under the conservative _prune_nonexclusive_branch_nodes algorithm, but no test pins the contract. To expose this issue, add a test that builds two IfInvocation nodes whose true-branch-exclusive sets overlap on a single source node and checks behavior when one IF resolves first (producing results) and the second IF resolves the other way and tries to skip the same prepared exec.

Open Questions

The new fallback in _prepare_if_inputs (graph.py:781-795) is justified in invokeai/app/services/shared/README.md:182-184 as covering "selected branch's upstream exec node was skipped and therefore produced no runtime output". I could not construct a real scheduling trajectory, given the current _node_outputs_stay_in_branch classifier, where this happens for a SELECTED branch. If the authors have a concrete reproducer, that scenario should be added as a regression test; if not, the silent-None fallback should be replaced with an assertion so the unreachable branch becomes self-policing.

invokeai/app/services/shared/graph.py:195-201 _prune_unselected_if_inputs now mutates execution_graph by deleting edges. Other observers of the execution graph (event emitters, persistence/serialization, debugging UI, replay logic) may rely on the original edge set being preserved across an execution. I did not check whether GraphExecutionState (which is persisted as JSON, see model_config at line 1843-1857) is round-tripped during a run; if it is, a snapshot taken after pruning has structurally different edges from one taken before. This is not necessarily a defect but is a contract change worth confirming.

I had the problem that the UI didn't know that the graph had finished rendering. But I saw that you already addressed this in another PR?

@JPPhoto
Copy link
Copy Markdown
Collaborator Author

JPPhoto commented Apr 25, 2026

Low: invokeai/app/services/shared/graph.py:781-795 _prepare_if_inputs silently swallows the case where a SELECTED branch's source has no entry in results and uses the field's default (typically None). This is a defensive fallback documented in invokeai/app/services/shared/README.md:182-184, but the synthetic test in tests/test_graph_execution_state.py:856-883 (test_prepare_if_inputs_ignores_selected_branch_sources_without_results) only constructs the state by hand (g.executed.add(true_value_exec_id) without writing to g.results); no test exercises a real scheduler trajectory that produces a selected-branch source skipped-but-executed state. The risk is that a future scheduler bug that incorrectly marks a selected-branch source as skipped now propagates None to downstream consumers as silent data loss instead of surfacing as KeyError. To expose this issue, add a test that constructs an actual graph + scheduling sequence (for example interleaved/nested IfInvocation nodes whose branch-exclusivity classifications cause a selected-branch source to be skipped through mark_exec_node_skipped) and asserts whether IfInvocation.value should reasonably be None there. If no such trajectory can be built, document explicitly that this branch is unreachable and replace the silent fallback with a precise RuntimeError so a future regression is visible.

Low: invokeai/app/services/shared/graph.py:366-371 _get_collect_iteration_mappings reads source_prepared_mapping[source_node_id] directly and does NOT apply the new "skipped" filter that _get_prepared_nodes_for_source now applies at line 426-431. Any CollectInvocation whose parent has skipped prepared exec nodes would still wire those skipped exec nodes as inputs, and _build_collect_collection -> _get_copied_result_value (line 747-748) crashes on missing results. In the current branch-exclusivity classification (_node_outputs_stay_in_branch, lines 143-151), a Collect with a skipped parent should itself be classified branch-exclusive, so this is theoretically not reachable; but the asymmetry between Collect parent resolution and other node parent resolution is the kind of inconsistency that breaks the next time the exclusivity logic is refined. To expose this issue, add a test that constructs a graph where a CollectInvocation's item parent is on the unselected branch of an IfInvocation (forcing the parent to be skipped) while the Collect itself is reachable, and asserts the engine does not crash; or harden _get_collect_iteration_mappings to apply the same filter.

Low: invokeai/app/services/shared/graph.py:480-499 get_iteration_node short-circuits with if len(prepared_nodes) == 1: return next(iter(prepared_nodes)) after the new skipped-exec filter. With multiple iterations of an outer iterator, it is possible for exactly one iteration's prepared exec to be live (others skipped) and the function returns that single live exec node WITHOUT verifying it actually matches the caller's prepared_iterator_nodes / parent iteration combination. The two new tests test_get_iteration_node_returns_single_active_prepared_exec_node and test_get_iteration_node_ignores_skipped_prepared_exec_nodes both pass [] for prepared_iterator_nodes, so neither covers the case where iteration-path matching matters. To expose this issue, add a test with an iterator whose body contains an If, run the iterator with iteration_count > 1, resolve different iterations to different branches so only one iteration's branch source remains live, and assert that downstream consumers for the OTHER iteration are not incorrectly bound to the surviving live exec node.

Low: invokeai/app/services/shared/graph.py:255-266 mark_exec_node_skipped calls self._state.executed.add(exec_node_id) and self._state._set_prepared_exec_state(exec_node_id, "skipped") unconditionally. If a node was previously executed normally (entry in results, state == "executed") and a separately-resolved IF later classifies the same prepared exec as part of an unselected branch, the metadata flips to state == "skipped" while results[exec_id] still holds the real output. After the new filter in _get_prepared_nodes_for_source (line 426-431), this exec node would suddenly become invisible to downstream materialization even though its output was real. This requires multiple IfInvocation nodes with overlapping branch-exclusive classifications and is unlikely under the conservative _prune_nonexclusive_branch_nodes algorithm, but no test pins the contract. To expose this issue, add a test that builds two IfInvocation nodes whose true-branch-exclusive sets overlap on a single source node and checks behavior when one IF resolves first (producing results) and the second IF resolves the other way and tries to skip the same prepared exec.

I addressed these review concerns with tests first. The selected-branch missing-result path now fails loudly instead of silently propagating None in invokeai/app/services/shared/graph.py, and the README now states that this path should be unreachable in normal scheduling in invokeai/app/services/shared/README.md. Collect parent mapping now filters skipped prepared exec nodes in graph.py, single-live-node iterator matching now still verifies iterator ancestry in graph.py, and mark_exec_node_skipped() no longer hides already executed results by flipping executed nodes to skipped in graph.py.

Tests added/updated in tests/test_graph_execution_state.py cover:

  • selected If branch with missing result now raising
  • collect mapping ignoring skipped prepared parents
  • iterator-path mismatch not reusing the only surviving live exec from another iteration
  • executed nodes remaining visible even if later considered for skipping

Open Questions

The new fallback in _prepare_if_inputs (graph.py:781-795) is justified in invokeai/app/services/shared/README.md:182-184 as covering "selected branch's upstream exec node was skipped and therefore produced no runtime output". I could not construct a real scheduling trajectory, given the current _node_outputs_stay_in_branch classifier, where this happens for a SELECTED branch. If the authors have a concrete reproducer, that scenario should be added as a regression test; if not, the silent-None fallback should be replaced with an assertion so the unreachable branch becomes self-policing.

The concern about silent None fallback in invokeai/app/services/shared/graph.py is fair. There's not a demonstrated normal scheduler path where a SELECTED branch input should be missing because its upstream exec node was skipped. The current test in invokeai/tests/test_graph_execution_state.py is a defensive synthetic-state test, not a real scheduler reproducer. I addressed the first point by narrowing the fallback documentation and code comment so they no longer imply this is a normal selected-branch scheduler path.

invokeai/app/services/shared/graph.py:195-201 _prune_unselected_if_inputs now mutates execution_graph by deleting edges. Other observers of the execution graph (event emitters, persistence/serialization, debugging UI, replay logic) may rely on the original edge set being preserved across an execution. I did not check whether GraphExecutionState (which is persisted as JSON, see model_config at line 1843-1857) is round-tripped during a run; if it is, a snapshot taken after pruning has structurally different edges from one taken before. This is not necessarily a defect but is a contract change worth confirming.

My plan:

  • Keep the current pruning behavior for now.
  • Do not treat this as a retry/resume correctness bug.
  • Do treat it as a session-snapshot contract change and document it.

Why I think that is the right course:

  • Saved sessions are explicitly not used to drive live execution:
    • invokeai/app/services/session_queue/session_queue_sqlite.py:721-723
  • Retry does not reuse the persisted execution_graph; it rebuilds a fresh GraphExecutionState from the original source graph only:
    • invokeai/app/services/session_queue/session_queue_sqlite.py:977
  • The frontend code I found uses session.results and session.source_prepared_mapping, not session.execution_graph, for output handling.

So the mutation is real, but the impact appears limited to inspection/debug snapshots of failed/completed sessions, not runtime behavior or retry semantics. If you know of any real consumers of the execution graph, then I can separate
"pruned dependency bookkeeping" from the persisted execution_graph, for example via a private runtime-only
suppression set or a non-serialized field.

I had the problem that the UI didn't know that the graph had finished rendering. But I saw that you already addressed this in another PR?

Correct!

@Pfannkuchensack
Copy link
Copy Markdown
Collaborator

Optional Hardening Notes (non-blocking)

These are nice-to-haves on top of an already solid PR — none are defects, just polish.

1. Document idempotency of mark_exec_node_skipped

invokeai/app/services/shared/graph.py:255-269

The new early-return only short-circuits on state == "executed". Calling the function a second time on a node that is already "skipped" falls through and re-runs _remove_from_ready_queues, _set_prepared_exec_state, and the source-completion check. The result is idempotent in practice, but the asymmetry between the two terminal states ("executed" short-circuits, "skipped" re-runs) is subtle.

Optional: add a one-line comment noting that re-entry on "skipped" is intentionally idempotent, or short-circuit on either terminal state for symmetry.

2. Minor cost in get_iteration_node trivial path

invokeai/app/services/shared/graph.py:480-504

The single-prepared-node shortcut now always computes _get_parent_iterator_exec_nodes before returning, even when prepared_iterator_nodes is empty (in which case _matches_parent_iterators is vacuously true). The previous version returned in O(1) for that case.

Optional: skip the iterator-graph work when prepared_iterator_nodes is empty. Negligible impact unless this is on a hot path.

3. Improve diagnostics on _prepare_if_inputs defensive raise

invokeai/app/services/shared/graph.py:793-797

The new RuntimeError("IfInvocation selected input edge points at an exec node with no stored result output") correctly fails loud, but the message lacks the offending if_node.id, the edge source id, and the iteration path. If this branch ever does fire in production, those three values would shorten triage considerably.

Optional: include node.id, edge.source.node_id, and the iteration path in the message.

4. Tighten test_mark_exec_node_skipped_does_not_hide_already_executed_results

tests/test_graph_execution_state.py:954-967

The test seeds g.results[exec_id] via AnyTypeTestInvocation(...).invoke(Mock(InvocationContext)) and then asserts g.results[exec_id].value == "value". It works today because AnyTypeTestInvocation.invoke happens to return an output exposing .value, but the seam is incidental — a future change to that test invocation could turn this into a passing-but-misleading assertion.

Optional: construct the BaseInvocationOutput directly instead of routing through .invoke(Mock(...)).

@JPPhoto
Copy link
Copy Markdown
Collaborator Author

JPPhoto commented Apr 25, 2026

@Pfannkuchensack I finished the cleanup pass.

In invokeai/app/services/shared/graph.py, mark_exec_node_skipped() now short-circuits on both terminal states, so skipped re-entry is explicitly idempotent. In invokeai/app/services/shared/graph.py, get_iteration_node() restores the trivial fast path when there are no prepared iterator constraints. In invokeai/app/services/shared/graph.py, the defensive If raise now includes if_exec_id, source_exec_id, and iteration_path.

I also tightened the tests in tests/test_graph_execution_state.py: the missing-result If test now asserts the richer diagnostics, the executed-result test now uses AnyTypeTestInvocationOutput directly instead of going through .invoke(...), and there is a new idempotency test for repeated skip marking.

Copy link
Copy Markdown
Collaborator

@Pfannkuchensack Pfannkuchensack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python PRs that change python files python-tests PRs that change python tests services PRs that change app services v6.13.x

Projects

Status: 6.13.x Theme: MODELS

Development

Successfully merging this pull request may close these issues.

2 participants